
    Advances in Automatic Keyphrase Extraction

    The main purpose of this thesis is to analyze the field of Automatic Keyphrase Extraction (AKE), i.e., the automatic detection of the key concepts in a document, and to propose new improvements to it. In particular, we will discuss supervised machine learning algorithms for keyphrase extraction, first identifying their shortcomings and then proposing new techniques that exploit contextual information to overcome them. Keyphrase extraction requires that the key concepts, or keyphrases, appear verbatim in the body of the document. We will identify the fact that current algorithms do not use contextual information when detecting keyphrases as one of the main shortcomings of supervised keyphrase extraction. Instead, they mainly rely on statistical and positional cues, such as the frequency of a candidate keyphrase or the position of its first appearance in the document, to determine whether a phrase appearing in a document is a keyphrase. For this reason, we will show that a supervised keyphrase extraction algorithm that uses only statistical and positional features is able to extract good keyphrases from documents written in languages it has never seen. The algorithm will be trained on a common English dataset and a purpose-collected Arabic dataset, and also evaluated on Italian, Romanian and Portuguese. This result is then used as a starting point to develop new algorithms that use contextual information to improve the performance of automatic keyphrase extraction. The first algorithm we present uses new linguistic features based on anaphora resolution, a field of natural language processing that exploits the relations between elements of the discourse, such as pronouns. We evaluate several supervised AKE pipelines based on these features on the well-known SemEval 2010 dataset and show that performance increases when such features are added to a model that employs only statistical and positional knowledge. Finally, we investigate the possibilities offered by deep learning, proposing six different deep neural networks that perform automatic keyphrase extraction. These networks are based on bidirectional long short-term memory (BiLSTM) networks, on convolutional neural networks, or on a combination of both, together with a neural language model that creates a vector representation of each word of the document. They can learn new features from the whole document when extracting keyphrases and, once trained, do not need a corpus to extract keyphrases from new documents. We show that deep learning architectures outperform several supervised and unsupervised keyphrase extraction algorithms from the literature, and that the best performance is obtained when an additional neural representation of the input document is built and appended to the representation produced by the neural language model. Both the anaphora-based and the deep learning approaches show that using contextual information improves the performance of supervised algorithms for automatic keyphrase extraction. Indeed, among the methods presented in this thesis, the best-performing algorithms are those that receive more contextual information, either about the relations between a potential keyphrase and other parts of the document, as in the anaphora-based approach, or in the form of a neural representation of the input document, as in the deep learning approach. In contrast, using only statistical and positional knowledge allows building language-agnostic keyphrase extraction algorithms, at the cost of lower precision and recall.
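The abstract names the frequency of a candidate and the position of its first appearance as the typical statistical and positional cues. As a minimal sketch (not the thesis's actual feature extractor), the Python below computes both for every candidate n-gram; the function names, the regex tokenizer, the 3-token candidate limit and the normalization by document length are all assumptions. A supervised classifier would then be trained on vectors like these.

```python
import re
from collections import Counter


def candidate_ngrams(tokens, max_len=3):
    """Yield (phrase, start_index) for every n-gram of 1..max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]), i


def statistical_positional_features(text, max_len=3):
    """Return {phrase: (relative_frequency, relative_first_position)}.

    Hypothetical feature set for illustration:
    - relative_frequency: occurrences of the phrase / number of tokens
    - relative_first_position: index of the first occurrence / number of tokens
      (0.0 means the phrase opens the document)
    """
    # In Python 3, \w matches Unicode letters and digits, so no
    # language-specific resources are needed for tokenization.
    tokens = re.findall(r"\w+", text.lower())
    n_tokens = len(tokens)
    counts = Counter()
    first_pos = {}
    for phrase, start in candidate_ngrams(tokens, max_len):
        counts[phrase] += 1
        # n-grams of one length are visited left to right, so the first
        # stored start index is the earliest occurrence.
        first_pos.setdefault(phrase, start)
    return {p: (counts[p] / n_tokens, first_pos[p] / n_tokens) for p in counts}
```

Because nothing in the sketch depends on a particular language, the same code runs unchanged on Arabic or Portuguese text, which mirrors the language-agnostic behaviour the abstract describes.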

    Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

    In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution to improve the performance of supervised keyphrase extraction. To verify the impact of these features, we define a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. We then build new feature sets by adding combinations of the proposed linguistic features and evaluate the performance of the resulting system. We also use anaphora and coreference resolution to transform the documents, trying to simulate the cohesion process performed by the human mind. We found that our approach has a slightly positive impact on the performance of automatic keyphrase extraction, particularly when the ranking of the results is considered.
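One way to picture the document transformation is to rewrite each anaphoric mention with its antecedent, so that frequency-based features also count the mentions made through pronouns. The sketch below is an illustration, not the paper's implementation: it assumes coreference clusters are already available from an external resolver (passed in as hypothetical token spans, first span as the representative mention) and that mentions do not overlap.

```python
def substitute_mentions(tokens, clusters):
    """Replace every non-representative mention with its cluster's first mention.

    tokens:   list of word tokens.
    clusters: list of clusters; each cluster is a list of (start, end) token
              spans with `end` exclusive. The spans are assumed to come from
              an external coreference resolver and not to overlap.
    """
    # Map the start index of each mention to (end index, replacement tokens).
    replacements = {}
    for cluster in clusters:
        rep_start, rep_end = cluster[0]
        representative = tokens[rep_start:rep_end]
        for start, end in cluster[1:]:
            replacements[start] = (end, representative)

    out = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, representative = replacements[i]
            out.extend(representative)  # substitute the antecedent
            i = end                     # skip the original mention
        else:
            out.append(tokens[i])
            i += 1
    return out
```

For example, with the tokens of "Keyphrase extraction is hard . It needs context ." and a single cluster linking tokens 0 to 2 with token 5 ("It"), the output contains "Keyphrase extraction" twice, so its frequency feature doubles.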

    Influence of the specimen production and preparation on the compressive strength and the fatigue resistance of HPC and UHPC

    The results of tests under monotonically increasing and cyclic compressive loads are often analysed by means of probabilistic methods. Although the results show considerable scatter, especially in the number of cycles to failure, its causes cannot be completely explained. The literature mentions imperfections of the tested specimens among the causes of this scatter. Based on a round-robin test, the influence of HPC and UHPC (high-performance and ultra-high-performance concrete) production and specimen preparation techniques on the mean compressive strength, the mean number of cycles to failure and the data scatter has been evaluated. The main findings of the study are that the production techniques influence the compressive strength but do not affect the mean number of cycles to failure. Moreover, accurate preparation of the specimens has a positive influence on the compressive strength and on the scatter of the results of both compression and cyclic load tests. The mean number of cycles to failure of HPC specimens is not influenced by the preparation techniques, whereas the polishing technique may have a positive influence on the mean number of cycles to failure of UHPC specimens. © 2021, The Author(s)
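The abstract does not say which probabilistic measure of scatter the study uses. A common assumption for fatigue data, adopted in this hypothetical sketch, is to work with the base-10 logarithm of the number of cycles to failure N and report its mean and sample standard deviation; the function name is illustrative.

```python
import math
import statistics


def fatigue_scatter(cycles_to_failure):
    """Return (mean of log10 N, sample standard deviation of log10 N).

    Assumption: scatter is measured on log10 N because fatigue lives spread
    over orders of magnitude. This is not necessarily the study's method.
    Requires at least two values (statistics.stdev raises otherwise).
    """
    logs = [math.log10(n) for n in cycles_to_failure]
    return statistics.mean(logs), statistics.stdev(logs)
```

Comparing the second value between specimen series prepared with different techniques gives one simple, assumption-laden way to express "more" or "less" scatter.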

    Entity recognition in the biomedical domain using a hybrid approach.

    BACKGROUND: This article describes a high-recall, high-precision approach for extracting biomedical entities from scientific articles. METHOD: The approach uses a two-stage pipeline that combines a dictionary-based entity recognizer with a machine-learning classifier. First, the OGER entity recognizer, which is biased towards high recall, annotates the terms that appear in selected domain ontologies. Then the Distiller framework uses this information as a feature for a machine learning algorithm that selects only the relevant entities. For this step, we compare two supervised machine-learning algorithms: Conditional Random Fields and Neural Networks. RESULTS: In an in-domain evaluation on the CRAFT corpus, we test the performance of the combined systems in recognizing chemicals, cell types, cellular components, biological processes, molecular functions, organisms, proteins, and biological sequences. Our best system combines dictionary-based candidate generation with neural-network-based filtering. It achieves an overall precision of 86% at a recall of 60% on the named entity recognition task, and a precision of 51% at a recall of 49% on the concept recognition task. CONCLUSION: To our knowledge, these results are the best reported so far for this particular task.
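The two-stage idea, a recall-oriented dictionary lookup followed by a precision-oriented filter, can be sketched as follows. This is not OGER or the Distiller: the dictionary, the entity-type labels and `toy_score` are hypothetical stand-ins, and in the described system the filter would be a trained CRF or neural network rather than a hand-written rule.

```python
def dictionary_candidates(tokens, dictionary, max_len=4):
    """Stage 1 (high recall): return every span whose text is a dictionary term.

    dictionary maps lowercased term strings to an entity type.
    Overlapping matches are all kept, favouring recall over precision.
    Each candidate is (start, end, term, entity_type) with `end` exclusive.
    """
    candidates = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            term = " ".join(tokens[i:i + n]).lower()
            if term in dictionary:
                candidates.append((i, i + n, term, dictionary[term]))
    return candidates


def filter_candidates(tokens, candidates, score, threshold=0.5):
    """Stage 2 (precision): keep candidates whose classifier score reaches the threshold."""
    return [c for c in candidates if score(tokens, c) >= threshold]


def toy_score(tokens, candidate):
    """Hypothetical placeholder for a trained classifier.

    Trusts multi-word matches and matches containing uppercase letters,
    and gives a low score to short common words such as "cell".
    """
    start, end = candidate[0], candidate[1]
    span = tokens[start:end]
    if len(span) > 1 or any(ch.isupper() for ch in "".join(span)):
        return 1.0
    return 0.2
```

With the hypothetical dictionary `{"t cells": "CL", "cd4": "PR", "cell": "GO"}` and the sentence "T cells express CD4 in this cell line", stage 1 proposes three candidates and stage 2 drops the generic "cell" match, illustrating how the filter trades a little recall for precision.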

    Long-term motor deficit in brain tumour surgery with preserved intra-operative motor-evoked potentials

    Muscle motor-evoked potentials are commonly monitored during brain tumour surgery in motor areas, as they are assumed to reflect the integrity of descending motor pathways, including the corticospinal tract. However, while the loss of muscle motor-evoked potentials at the end of surgery is associated with long-term motor deficits (muscle motor-evoked potential-related deficits), there is increasing evidence that motor deficits can occur despite no change in muscle motor-evoked potentials (muscle motor-evoked potential-unrelated deficits), particularly after surgery of non-primary regions involved in motor control. In this study, we aimed to investigate the incidence of muscle motor-evoked potential-unrelated deficits and to identify the associated brain regions. We retrospectively reviewed 125 consecutive patients who underwent surgery for peri-Rolandic lesions with intra-operative neurophysiological monitoring. Intra-operative changes in muscle motor-evoked potentials were correlated with motor outcome, assessed with the Medical Research Council scale. We performed voxel-lesion-symptom mapping to identify which resected regions were associated with short- and long-term muscle motor-evoked potential-associated motor deficits. Reductions in muscle motor-evoked potentials significantly predicted long-term motor deficits. However, in more than half of the patients who experienced long-term deficits (12/22 patients), no muscle motor-evoked potential reduction was reported during surgery. Lesion analysis showed that muscle motor-evoked potential-related long-term motor deficits were associated with direct or ischaemic damage to the corticospinal tract, whereas muscle motor-evoked potential-unrelated deficits occurred when supplementary motor areas were resected together with dorsal premotor regions and the anterior cingulate. Our results indicate that long-term motor deficits unrelated to the corticospinal tract can occur more often than currently reported. As these deficits cannot be predicted by muscle motor-evoked potentials, a combination of awake and/or novel asleep techniques other than muscle motor-evoked potential monitoring should be implemented.

    Efficient and Accurate Entity Recognition for Biomedical Text

    This short paper presents an efficient implementation of a named entity recognition system for biomedical entities, which is also available as a web service. The approach combines a dictionary-based entity recognizer with a machine-learning classifier that acts as a filter. We evaluated the efficiency of the approach by participating in the TIPS challenge (BioCreative V.5), where it obtained the best results among the participating systems. We separately evaluated the quality of entity recognition and linking against a manually annotated reference corpus (CRAFT), obtaining state-of-the-art results.